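Day 12. Self Attention: Understanding the Attention Mechanism by Building an LLM

2025 iThome Ironman Contest

DAY 12

Generative AI

LLM Study Notes - What Happens After You Type a Question into an LLM and Press Enter? Series, Part 12

17th Ironman Contest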

minw

2025-09-26 23:53:33


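In the previous post, we converted tokens into embeddings of the dimension we wanted. This post picks up from there.

Self Attention: A Simplified Attention Mechanism

Recap

We have turned each token into numbers, but those numbers only encode the word itself and its position in the sentence. We need each token to also carry the meaning of its surrounding context. Take the character 「寫」 as an example: in 「寫」生 (sketching from life) it refers to drawing, while in 「寫」作 (writing) it refers to composing text. The character is the same, yet the meanings are far apart.

To capture how each token embedding relates to its context, we need to connect every token with every other token.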

import torch

inputs = torch.tensor(
  [[0.43, 0.15, 0.89], # Your     (x^1)
   [0.55, 0.87, 0.66], # journey  (x^2)
   [0.57, 0.85, 0.64], # starts   (x^3)
   [0.22, 0.58, 0.33], # with     (x^4)
   [0.77, 0.25, 0.10], # one      (x^5)
   [0.05, 0.80, 0.55]] # step     (x^6)
)

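The tensor above is a toy embedding matrix for the sentence "Your journey starts with one step": one 3-dimensional row per token, with arbitrary values. Next, taking the second token as the query, we compute its dot product with every token in the sentence.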

query = inputs[1]  # take the second token as the query

attn_scores_2 = torch.empty(inputs.shape[0])  # attention scores for the second token
for i, x_i in enumerate(inputs):
    attn_scores_2[i] = torch.dot(x_i, query)  # dot product of each token with the second token
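The loop above can also be written as a single matrix-vector product. The following is a minimal sketch that reuses the same toy `inputs`; the names `scores_loop` and `scores_vec` are only for illustration.

```python
import torch

inputs = torch.tensor(
    [[0.43, 0.15, 0.89],  # Your
     [0.55, 0.87, 0.66],  # journey
     [0.57, 0.85, 0.64],  # starts
     [0.22, 0.58, 0.33],  # with
     [0.77, 0.25, 0.10],  # one
     [0.05, 0.80, 0.55]]  # step
)
query = inputs[1]

# Loop version, as in the text: one dot product per token
scores_loop = torch.empty(inputs.shape[0])
for i, x_i in enumerate(inputs):
    scores_loop[i] = torch.dot(x_i, query)

# Vectorized version: (6, 3) @ (3,) gives all six dot products at once
scores_vec = inputs @ query

print(scores_vec)
print(torch.allclose(scores_loop, scores_vec))
```

The largest score belongs to the query token itself, and "starts" comes a close second because its embedding is almost identical to that of "journey".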

Next, we normalize these dot products so that they sum to 1. The point of this step is to make the values comparable with one another; it works like a rescaling.
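As a rough illustration, the simplest way to make the scores sum to 1 is to divide each one by their total. This sketch assumes all scores are positive, which happens to hold for the toy data here; with negative scores, dividing by the sum no longer gives meaningful weights.

```python
import torch

inputs = torch.tensor(
    [[0.43, 0.15, 0.89],  # Your
     [0.55, 0.87, 0.66],  # journey
     [0.57, 0.85, 0.64],  # starts
     [0.22, 0.58, 0.33],  # with
     [0.77, 0.25, 0.10],  # one
     [0.05, 0.80, 0.55]]  # step
)
attn_scores_2 = inputs @ inputs[1]  # scores of the second token against all tokens

# Divide by the total so the weights sum to 1
attn_weights_2_sum = attn_scores_2 / attn_scores_2.sum()

print(attn_weights_2_sum)
print(attn_weights_2_sum.sum())
```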

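In practice, the most common way to normalize is softmax. Its outputs are always positive, it copes with extreme values by widening the relative gaps between scores, and it is a smooth function, which makes the later optimization better behaved both mathematically and computationally.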

# A hand-written softmax
def softmax_naive(x):
    return torch.exp(x) / torch.exp(x).sum(dim=0)
# PyTorch also provides torch.softmax(attn_scores_2, dim=0)
attn_weights_2_naive = softmax_naive(attn_scores_2)
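One reason to prefer `torch.softmax` over the hand-written version is numerical stability: `torch.exp` overflows for large inputs. The sketch below shows a common workaround, subtracting the maximum before exponentiating. `softmax_stable` is a hypothetical helper written for this illustration, and the `small` and `large` tensors are made-up values.

```python
import torch

def softmax_naive(x):
    return torch.exp(x) / torch.exp(x).sum(dim=0)

def softmax_stable(x):
    # Subtracting the max leaves the result unchanged mathematically,
    # but keeps exp() from overflowing
    shifted = x - x.max()
    e = torch.exp(shifted)
    return e / e.sum(dim=0)

small = torch.tensor([0.5, 1.0, 1.5])           # made-up, well-behaved scores
large = torch.tensor([1000.0, 1001.0, 1002.0])  # made-up, very large scores

naive_large = softmax_naive(large)    # exp(1000) overflows in float32, giving inf / inf
stable_large = softmax_stable(large)

print(naive_large)
print(stable_large)
print(torch.allclose(softmax_naive(small), softmax_stable(small)))
```

For ordinary inputs the two versions agree; they only diverge once the exponentials overflow.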

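Finally, we take the weighted sum of all input vectors, using the attention weights as the coefficients. The result is the context vector of the second token: a blend of every token in the sentence, weighted by how strongly the second token attends to each.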

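query = inputs[1]

# Attention weights for the second token, using PyTorch's softmax
attn_weights_2 = torch.softmax(attn_scores_2, dim=0)

context_vec_2 = torch.zeros(query.shape)
for i, x_i in enumerate(inputs):
    context_vec_2 += attn_weights_2[i] * x_i  # weighted sum of the input vectors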
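The weighted sum can likewise be collapsed into one product between the weight vector and the input matrix. This is a minimal sketch using the same toy data; `context_loop` and `context_vec` are illustrative names.

```python
import torch

inputs = torch.tensor(
    [[0.43, 0.15, 0.89],  # Your
     [0.55, 0.87, 0.66],  # journey
     [0.57, 0.85, 0.64],  # starts
     [0.22, 0.58, 0.33],  # with
     [0.77, 0.25, 0.10],  # one
     [0.05, 0.80, 0.55]]  # step
)
query = inputs[1]
attn_weights_2 = torch.softmax(inputs @ query, dim=0)

# Loop version, as in the text
context_loop = torch.zeros(query.shape)
for i, x_i in enumerate(inputs):
    context_loop += attn_weights_2[i] * x_i

# Vectorized version: (6,) @ (6, 3) gives a 3-dimensional context vector
context_vec = attn_weights_2 @ inputs

print(context_vec)
print(torch.allclose(context_loop, context_vec))
```

Because the weights are positive and sum to 1, each component of the context vector stays between the smallest and largest value of that component across the inputs.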

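Everything above is the computation for a single token. In practice we need the same numbers for every token, which mathematically amounts to a matrix multiplication: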

attn_scores = torch.empty(6, 6)

for i, x_i in enumerate(inputs):  # nested loops over every pair of tokens
    for j, x_j in enumerate(inputs):
        attn_scores[i, j] = torch.dot(x_i, x_j)

# Or use matrix multiplication directly, which performs better:
attn_scores = inputs @ inputs.T
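A quick sanity check on the score matrix, sketched with the same toy data: row `i` holds the scores of token `i` against every token, so row 1 should match the single-token scores computed earlier, and the matrix should be symmetric because the dot product does not depend on the order of its arguments.

```python
import torch

inputs = torch.tensor(
    [[0.43, 0.15, 0.89],  # Your
     [0.55, 0.87, 0.66],  # journey
     [0.57, 0.85, 0.64],  # starts
     [0.22, 0.58, 0.33],  # with
     [0.77, 0.25, 0.10],  # one
     [0.05, 0.80, 0.55]]  # step
)

attn_scores = inputs @ inputs.T     # shape (6, 6)
attn_scores_2 = inputs @ inputs[1]  # single-token scores for "journey"

print(attn_scores.shape)
print(torch.allclose(attn_scores[1], attn_scores_2))  # row 1 equals the single-token case
print(torch.allclose(attn_scores, attn_scores.T))     # symmetric score matrix
```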

As before, we normalize the scores and then take the weighted sum:

attn_weights = torch.softmax(attn_scores, dim=-1)
all_context_vecs = attn_weights @ inputs
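Putting the three steps together (scores, softmax, weighted sum), the whole simplified mechanism fits in a few lines. The function name `simple_self_attention` is a hypothetical wrapper for this sketch, not something from a library; it has no trainable weights, matching the simplified version described in this post.

```python
import torch

def simple_self_attention(x: torch.Tensor):
    """Simplified self-attention without trainable weights.

    x has shape (num_tokens, dim). Returns (context_vectors, attention_weights).
    """
    scores = x @ x.T                         # pairwise dot products, (num_tokens, num_tokens)
    weights = torch.softmax(scores, dim=-1)  # normalize each row so it sums to 1
    return weights @ x, weights              # weighted sum of inputs for every token

inputs = torch.tensor(
    [[0.43, 0.15, 0.89],  # Your
     [0.55, 0.87, 0.66],  # journey
     [0.57, 0.85, 0.64],  # starts
     [0.22, 0.58, 0.33],  # with
     [0.77, 0.25, 0.10],  # one
     [0.05, 0.80, 0.55]]  # step
)

all_context_vecs, attn_weights = simple_self_attention(inputs)

# Cross-check row 1 against the single-token computation for "journey"
context_vec_2 = torch.softmax(inputs @ inputs[1], dim=0) @ inputs

print(all_context_vecs.shape)
print(attn_weights.sum(dim=-1))
print(torch.allclose(all_context_vecs[1], context_vec_2))
```

Note that `dim=-1` normalizes across each row, so every token gets its own set of weights that sum to 1.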

